Recognizing Hand-Printed Letters and Digits

Authors

  • Gale Martin
  • James A. Pittman
Abstract

We are developing a hand-printed character recognition system using a multilayered neural net trained through backpropagation. We report on results of training nets with samples of hand-printed digits scanned from bank checks and hand-printed letters interactively entered into a computer through a stylus digitizer. Given a large training set, and a net with sufficient capacity to achieve high performance on the training set, nets typically achieved error rates of 4-5% at a 0% reject rate and 1-2% at a 10% reject rate. The topology and capacity of the system, as measured by the number of connections in the net, have surprisingly little effect on generalization. For those developing practical pattern recognition systems, these results suggest that a large and representative training sample may be the single most important factor in achieving high recognition accuracy. From a scientific standpoint, these results raise doubts about the relevance to backpropagation of learning models that estimate the likelihood of high generalization from estimates of capacity. Reducing capacity does have other benefits, however, especially when the reduction is accomplished by using local receptive fields with shared weights. In this latter case, we find the net evolves feature detectors resembling those in visual cortex and Linsker's orientation-selective nodes.

Practical interest in hand-printed character recognition is fueled by two current technology trends: one toward systems that interpret hand-printing on hard-copy documents and one toward notebook-like computers that replace the keyboard with a stylus digitizer. The stylus enables users to write and draw directly on a flat panel display. In this paper, we report on results applying multi-layered neural nets trained through backpropagation (Rumelhart, Hinton, & Williams, 1986) to both cases. Developing pattern recognition systems is typically a two-stage process. 
First, intuition and experimentation are used to select a set of features to represent the raw input pattern. Then a variety of well-developed techniques are used to optimize the classifier system that assumes this featural representation. Most applications of backpropagation learning to character recognition use the learning capabilities only for this latter stage, developing the classifier system (Burr, 1986; Denker, Gardner, Graf, Henderson, Howard, Hubbard, Jackel, Baird, & Guyon, 1989; Mori & Yokosawa, 1989; Weideman, Manry, & Yau, 1989). However, backpropagation learning affords the opportunity to optimize feature selection and pattern classification simultaneously. We avoid using pre-determined features as input to the net in favor of using a pre-segmented, size-normalized grayscale array for each character. This is a first step toward the goal of approximating the raw input projected onto the human retina, in that no pre-processing of the input is required.

We report on results for both hand-printed digits and letters. The hand-printed digits come from a set of 40,000 hand-printed digits scanned from the numeric amount region of "real-world" bank checks. They were pre-segmented and size-normalized to a 15x24 grayscale array. The test set consists of 4,000 samples and training sets varied from 100 to 35,200 samples. Although it is always difficult to compare recognition rates arising from different pattern sets, some appreciation for the difficulty of categorization can be gained using human performance data as a benchmark. An independent person categorizing the test set of pre-segmented, size-normalized digits achieved an error rate of 3.4%. This accuracy is considerably below the near-perfect performance of operators keying in numbers directly from bank checks, because the segmentation algorithm is flawed. 
Working with letters, as well as digits, enables tests of the generality of results on a different pattern set having more than double the number of output categories. The hand-printed letters come from a set of 8,600 upper-case letters collected from over 110 people writing with a stylus input device on a flat panel display. The stylus collects a sequence of x-y coordinates at 200 points per second at a spatial resolution of 1000 points per inch. The temporal sequence for each character is first converted to a size-normalized bitmap array, keeping aspect ratio constant. We have found that recognition accuracy is significantly improved if these bitmaps are blurred through convolution with a Gaussian distribution. Each pattern is represented as a 15x24 grayscale image. A test set of 2,368 samples was extracted by selecting samples from 18 people, so that training sets were generated by people different from those generating the test set. Training set sizes ranged from 500 to roughly 6,300 samples.

1 HIGH RECOGNITION ACCURACY

We find relatively high recognition accuracy for both pattern sets. Table 1 reports the minimal error rates achieved on the test samples for both pattern sets, at various reject rates (see footnote 1).

Footnote 1: Effects of the number of training samples and network capacity and topology are reported in the next section. Nets were trained to error rates of 2-3%. Training began with a learning rate of .05 and a momentum value of .9. The learning rate was decreased when training accuracy began to oscillate or had stabilized for a large number of training epochs. We evaluate the output vector on a winner-take-all basis, as this consistently improves accuracy and results in network parameters having a smaller effect on performance.

In the case of the hand-printed digits, the 4% error rate (0% rejects) approaches the 3.4% errors made by the human judge. 
This suggests that further improvements to generalization will require improving segmentation accuracy. The fact that an error rate of 5% was achieved for letters is promising. Accuracy is fairly high, even though there are a large number of categories (26). This error rate may be adequate for applications where contextual constraints can be used to significantly boost accuracy at the word-level.

Table 1: Error rates of best nets trained on largest sample sets and tested on new samples

REJECT RATE    DIGITS    LETTERS
0%             4%        5%
5%             3%        3%
10%            1%        2%
35%            .001%     .003%

2 MINIMAL NETWORK CAPACITY AND TOPOLOGY EFFECTS

The effects of network parameters on generalization have both practical and scientific significance. The practical developer of pattern recognition systems is interested in such effects to determine whether limited resources should be spent on trying to optimize network parameters or on collecting a larger, more representative training set. For the scientist, effects of capacity bear on the relevance of learning models to backpropagation. A central premise of most general models of learning-by-example is that the size of the initial search space (the capacity of the system) determines the number of training samples needed to achieve high generalization performance. Learning is conceptualized as a search for a function that maps all possible inputs to their correct outputs. Learning occurs by comparing successive samples of input-output pairs to functions in a search space. Functions inconsistent with training samples are rejected. Very large training sets narrow the search down to a function that closely approximates the desired function and yields high generalization. The capacity of a learning system (the number of functions it can represent) determines generalization, since a larger initial search space requires more training samples to narrow the search sufficiently. This suggests that to improve generalization, capacity should be minimized. 
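The search view of learning described above can be made concrete with a toy sketch. This is an illustration under stated assumptions, not the authors' procedure and not backpropagation: Boolean functions on three input bits are enumerated as truth tables, and every function that disagrees with a training sample is discarded. The helper names (`all_boolean_functions`, `single_input_functions`, `surviving_counts`) and the choice of target function are hypothetical, introduced only for this example.

```python
from itertools import product


def all_boolean_functions(n_inputs):
    """Large search space: every truth table over 2**n_inputs input rows."""
    return list(product((0, 1), repeat=2 ** n_inputs))


def single_input_functions(n_inputs):
    """Small search space: constants, plus each input bit and its negation."""
    n_rows = 2 ** n_inputs
    tables = {tuple([0] * n_rows), tuple([1] * n_rows)}
    for k in range(n_inputs):
        bit = tuple((row >> k) & 1 for row in range(n_rows))
        tables.add(bit)
        tables.add(tuple(1 - b for b in bit))
    return sorted(tables)


def surviving_counts(functions, target, row_order):
    """Count functions still consistent after each successive training sample.

    Each sample is an input row together with the target's output on that row.
    """
    remaining = list(functions)
    counts = [len(remaining)]
    for row in row_order:
        remaining = [f for f in remaining if f[row] == target[row]]
        counts.append(len(remaining))
    return counts


if __name__ == "__main__":
    n_inputs = 3
    # Assumed target for the illustration: "output equals the lowest input bit".
    target = tuple(row & 1 for row in range(2 ** n_inputs))
    rows = range(2 ** n_inputs)
    print("large space:", surviving_counts(all_boolean_functions(n_inputs), target, rows))
    print("small space:", surviving_counts(single_input_functions(n_inputs), target, rows))
```

In this toy setting the small space collapses to the target after far fewer samples than the large one, which is the intuition behind minimizing capacity. It works only because the small space was chosen to contain the target.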
Unfortunately, it is typically unclear how to minimize capacity without eliminating the desired function from the search space. A heuristic, which is often suggested, is that simple is usually better. It receives support from experience in curve fitting. Low-order polynomials typically extrapolate and interpolate better than high-order polynomials (Duda & Hart, 1973). Extensions of the heuristic to neural net learning propose reducing capacity by reducing the number of connections or the number of bits used to represent each connection weight (Baum & Haussler, 1989; Denker, Schwartz, Wittner, Solla, Howard, Jackel, & Hopfield, 1987). We manipulated the capacity of nets in a number of ways: 1) varying the number of hidden nodes, 2) limiting connectivity between layers so that nodes received input from only local areas, and 3) sharing connection weights between hidden nodes. We found only negligible effects on generalization.

2.1 NUMBER OF HIDDEN NODES

Figure 1 presents generalization results as a function of training set size for nets having one hidden layer and varying numbers of hidden nodes. The number of free parameters (i.e., number of connections and biases) in each case is presented in parentheses. Despite considerable variation in the number of free parameters, using nets with fewer hidden nodes did not improve generalization. Baum & Haussler (1989) estimate the number of training samples required to achieve an error rate $\epsilon$ (where $0 < \epsilon \le 1/8$) on the generalization test, when an error rate of $\epsilon/2$ has been achieved on the training set. They assume a feed-forward net with one hidden layer and $W$ connections. The estimates are distribution-free in the sense that calculations assume an arbitrary to-be-learned function. If the number of training samples is of order $\frac{W}{\epsilon} \log \frac{N}{\epsilon}$, where $N$ refers to the number of nodes, then it is a near certainty that the net will achieve generalization rates of $(1 - \epsilon)$. 
This estimate is the upper-bound on the number of training samples needed. They also provide a lower

[Figure 1: generalization as a function of training set size, with panels "Digits" and "Letters". Legend: Number of Hidden Nodes (free parameters). Digits: 50 (18,560), 170 (63,080), 383 (142,103). Letters: 170 (65,816), 365 (141,281).]
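The free-parameter counts quoted for Figure 1 and the Baum & Haussler order term can be checked with a short sketch. It assumes a fully connected net with 15x24 = 360 inputs, one hidden layer, and one bias per hidden and output node; that assumption reproduces the counts quoted in parentheses. The function names, the use of the natural logarithm, taking N as hidden plus output nodes, and the choice of 0.05 for the target error rate are illustrative assumptions, and the constant factors of the estimate are omitted, so the result indicates scale only.

```python
import math


def free_parameters(n_inputs, n_hidden, n_outputs):
    """Connections plus biases for a fully connected net with one hidden layer."""
    hidden = (n_inputs + 1) * n_hidden    # weights into each hidden node, plus its bias
    output = (n_hidden + 1) * n_outputs   # weights into each output node, plus its bias
    return hidden + output


def sample_order_term(n_weights, n_nodes, epsilon):
    """Order term (W / eps) * log(N / eps); constant factors are left out."""
    if not 0 < epsilon <= 1 / 8:
        raise ValueError("the estimate is stated for 0 < epsilon <= 1/8")
    return (n_weights / epsilon) * math.log(n_nodes / epsilon)


N_INPUTS = 15 * 24  # one input per pixel of the 15x24 grayscale array

# (pattern set, hidden nodes, output categories) as listed for Figure 1
NETS = [
    ("digits", 50, 10),
    ("digits", 170, 10),
    ("digits", 383, 10),
    ("letters", 170, 26),
    ("letters", 365, 26),
]

if __name__ == "__main__":
    epsilon = 0.05  # assumed target error rate, for illustration only
    for name, n_hidden, n_outputs in NETS:
        w = free_parameters(N_INPUTS, n_hidden, n_outputs)
        n = n_hidden + n_outputs  # assumption: N counts hidden and output nodes
        m = sample_order_term(w, n, epsilon)
        print(f"{name:8s} hidden={n_hidden:3d} W={w:7,d} order term ~ {m:,.0f}")
```

Under these assumptions, even the smallest digit net gives an order term in the millions, well above the largest training set of 35,200 samples.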

Similar resources

A Modfied Self-organizing Map Neural Network to Recognize Multi-font Printed Persian Numerals (RESEARCH NOTE)

This paper proposes a new method to distinguish the printed digits, regardless of font and size, using neural networks. Unlike our proposed method, existing neural network based techniques are only able to recognize the trained fonts. These methods need a large database containing digits in various fonts. New fonts are often introduced to the public, which may not be truly recognized by the Opti...

Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model

In order to facilitate the entry of data into the computer and its digitalization, automatic recognition of printed texts and manuscripts is one of the considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...

Recognizing Hand-Printed Digits with a Distance Quasi-Metric

A distance quasi-metric for pattern recognition is presented. The “quasi” modifier distinguishes the metric from “true” distance metrics which obey a set of standard constraints. By relaxing one of the constraints, and coupling it with a fast multi-dimensional search technique, the metric demonstrates improved accuracy and efficiency compared to other metrics in recognizing hand-written digit s...

What differs in visual recognition of handwritten vs. printed letters? An fMRI study.

In models of letter recognition, handwritten letters are considered as a particular font exemplar, not qualitatively different in their processing from printed letters. Yet, some data suggest that recognizing handwritten letters might rely on distinct processes, possibly related to motor knowledge. We applied functional magnetic resonance imaging to compare the neural correlates of perceiving h...

Idiap Recognition of Handprinted Digits 1 Using Optimal Bounded Error Matching

This paper describes a system that recognizes hand-printed digits. The system is based on optimal bounded error matching, a technique already in common use in general-purpose 2D and 3D visual object recognition systems in cluttered, noisy scenes. In this paper, we demonstrate that the same techniques achieve high recognition rates (up to 99.2%) on real-world data (the NIST database of hand-prin...

Publication date: 1989